In probability and statistics, a Bernoulli process is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1. The component Bernoulli variables Xi are identical and independent. Prosaically, a Bernoulli process is repeated coin flipping, possibly with an unfair coin with consistant unfairness.
Every variable Xi in the sequence may be associated with a Bernoulli trial or experiment. They all have the same Bernoulli distribution.
The problem of determining the process, given only a limited sample of the Bernoulli trials, may be called the problem of checking if a coin is fair.
Contents |
A Bernoulli process is a finite or infinite sequence of independent random variables X1, X2, X3, ..., such that
In other words, a Bernoulli process is a sequence of independent identically distributed Bernoulli trials.
Independence of the trials implies that the process is memoryless. Given that the probability p is known, past outcomes provide no information about future outcomes. (If p is unknown, however, the past informs about the future indirectly, through inferences about p.)
If the process is infinite, then from any point the future trials constitute a Bernoulli process identical to the whole process, the fresh-start property.
The two possible values of each Xi are often called "success" and "failure". Thus, when expressed as a number 0 or 1, the outcome may be called the number of successes on the ith "trial".
Two other common interpretations of the values are true or false and yes or no. Under any interpretation of the two values, the individual variables Xi may be called Bernoulli trials with parameter p.
In many applications time passes between trials, as the index i increases. In effect, the trials X1, X2, ... Xi, ... happen at "points in time" 1, 2, ..., i, .... That passage of time and the associated notions of "past" and "future" are not necessary, however. Most generally, any Xi and Xj in the process are simply two from a set of random variables indexed by {1, 2, ..., n} or by {1, 2, 3, ...}, the finite and infinite cases.
Several random variables and probability distributions beside the Bernoullis may be derived from the Bernoulli process:
The negative binomial variables may be interpreted as random waiting times.
The Bernoulli process can be formalized in the language of probability spaces as a random sequence in {0, 1}, a single random variable.
A Bernoulli process is then a probability triple and a random variable X mapping Ω into the set {0, 1}N such that for every , one has with probability p and with probability 1 − p.
Where there is no possible confusion, Xi(ω) may be written Xi.
The term Bernoulli sequence is often used informally to refer to a realization of a Bernoulli process. However, the term has an entirely different formal definition as given below.
Suppose a Bernoulli process formally defined as a single random variable (see preceding section). For every there is a sequence of integers
called the Bernoulli sequence associated with the Bernoulli process. For example, if ω represents a sequence of coin flips, then the associated Bernoulli sequence is the list of natural numbers or time-points for which the coin toss outcome is heads.
So defined, a Bernoulli sequence is also a random subset of the index set, the natural numbers .
Almost all Bernoulli sequences are ergodic sequences.
From any Bernoulli process one may derive a Bernoulli process with p = 1/2 by the von Neumann extractor, the earliest randomness extractor, which actually extracts uniform randomness.
Represent the observed process as a sequence of zeroes and ones, or bits, and group that input stream in non-overlapping pairs of successive bits, such as (11)(00)(10)... . Then for each pair,
This table summarizes the computation.
input | output |
---|---|
00 | discard |
01 | 0 |
10 | 1 |
11 | discard |
In the output stream 0 and 1 are equally likely, as 10 and 01 are equally likely in the original, both having probability pq = qp. This extraction of uniform randomness does not require the input trials to be independent, only uncorrelated. More generally, it works for any exchangeable sequence of bits: all sequences that are finite rearrangements are equally likely.
The Von Neumann extractor uses two input bits to produce either zero or one output bits, so the output is shorter than the input by a factor of at least 2. On average the computation discards proportion p2 + (1 − p)2 of the input pairs, or proportion p2 + q2, which is near one when p is near zero or one.
The discard of input pairs is at least proportion 1/2, the minimum which occurs where p = 1/2 for the original process. In that case the output stream is 1/4 the length of the input on average.
The generalization of the Bernoulli process to more than two possible outcomes is called the Bernoulli scheme.